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ABSTRACT 

Growing interest in 21 cm tomography has led to the design and construction of broad- 
band radio interferometers with low noise, moderate angular resolution, high spectral 
resolution, and wide fields of view. With characteristics somewhat different from tradi- 
tional radio instruments, these interferometers may require new calibration techniques 
in order to reach their design sensitivities. Self-calibration or redundant calibration 
techniques that allow an instrument to be calibrated off complicated sky emission 
structures are ideal. In particular, the large number of redundant baselines possessed 
by these new instruments makes redundant calibration an especially attractive op- 
tion. In this paper, we explore the errors and biases in existing redundant calibration 
schemes through simulations, and show how statistical biases can be eliminated. We 
also develop a general calibration formalism that includes both redundant baseline 
methods and basic point source calibration methods as special cases, and show how 
slight deviations from perfect redundancy and coplanarity can be taken into account. 



1 INTRODUCTION 

Technological advances of recent years have enabled the design and construction of radio telescopes with broadband frequency 
coverage, low instrumental noise, moderate angular resolution, high spectral resolution, and wide fields of view. These in- 
struments have generated considerable scientific interest due to their unique potential to do 21 cm cosmology, mapping the 
high-redshift universe using redshifted radio emission from neutral hydrogen. Not only would this provide the only direct 
measurements of the cosmological dark ages and the Epoch of Reionization ( [Madau et al. 1997 Tozzi et al. 2000a|b Iliev 



let al.|2002| [Furlanetto et al.|[2004l |Loeb fc Zaldarriaga|2004l [Furlanetto et al.|[2004l [Barkana fc Loeb|[2005l [Furlanetto et al 
120091), but it also has the potential to overtake the cosmic microwave background as the most powerful cosmological probe, 



measuring cosmological parameters related to inflation, dark matter and neutrinos to unprecedented accuracy ( McQuinn et al 



2006[ [Santos fc Cooray|2006[ [Bowman et al.|2007||Mao et al.|2008[ [Furlanetto et al.|2009[) and improving provide dark energy 



constraints after reionization through the detection of baryon acoustic oscillations (Wyithe et al. 2008 Chang et al 
Morales fc Wyithe[[2009| ). 



2008 



From a practical standpoint, however, none of the aforementioned applications will be feasible without reliable algorithms 
for instrumental calibration. Compared to traditional radio astronomy experiments, instruments designed for 21 cm tomogra- 
phy will require particularly precise calibration because of the immense foreground subtraction challenges involved. Epoch of 
Reionization experiments, for example, will be attempting to extract cosmological signals from beneath foregrounds that are 
of order 10'* times brighter ( de Oliveira-Costa et al.|2008 1. Such procedures, where one must subtract strong foregrounds from 
a bright sky to uncover weak cosmological signals, are particularly susceptible to calibration errors. A prime example of this 



is the effect that a miscalibration has on the subtraction of bright, resolved point sources. Datta et al. (20091, for instance. 



argued that bright point sources must be localized to extremely high positional accuracy in order for their subtraction to 



be successful, and Liu et al. (20091 showed that failure to do so could comprise one's subsequent ability to subtract any 



unresolved point sources, making the detection of the 21 cm cosmological signal difficult. This places stringent requirements 
on calibration accuracy. 

While standard algorithms are available for calibrating radio interferometers, the unprecedented challenges faced by 
21 cm tomography and the advent of a new class of large radio arrays (such as PAPER ( Parsons et al.|20T0 1 , LOFAR ( Garrett 
2009[ ), MWA ( [Lonsdale et al.|2009[ ), CRT ( [Peterson et al.|2006[ ), the Omniscope ( [Tegmark fc Zaldarriaga|2009b[ ), and the SKA 
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i Carilli fc Rawlings|[2004l)) mean that it is timely to consider the relative merits of various algorithms. For example, several 
authors ( Bhatnagar et al.|2008 Morales fc Matejek|2009 1 have pointed out that these new high precision radio arrays require 
close attention to be paid to heterogeneities in the primary beams of different antenna elements. In this paper, we focus on 
the technique of redundant calibration, which takes advantage of the fact that these new telescopes generically possess a large 
number of redundant or near-redundant baselines. This property of exact or near-redundancy is not an accident, and arises 
from the science requirements of 21cm tomographjQ where the demanding need for high sensitivity on large spatial scales 
calls for arrays with a large number of short baselines ( |Morales|2005 Lidz et al.|2008 1. Since these arrays typically also have 
very wide fields-of-view and in many cases operate at frequencies where all-sky survey data is scarce, the fewer assumptions 
one needs to make about the structure of the sky signal, the better. This is an advantage that redundant calibration has over 
other calibration methods — no a priori assumptions need to be made about calibrator sources. 

The use of redundant baselines for calibration goes far back. For example, [Noordam fc de Bruyn|([l982 1 used this approach 



to produce high dynamic range images of 3C84, and similar techniques have been used in Solar imaging (Ishiguro 1974 



et al. 19991. The methods used are described in detail in Wieringa (19921; Yang (19881; Pearson fc Readhead (19841, and 



Ramesh 



closely resemble the logarithmic implementation we review in Section [2. 3| In this paper, we build on this previous work and 
extend it in several ways: 

(i) We provide a detailed examination of the errors and biases involved in existing redundant baseline methods. 

(ii) We introduce methods to mitigate (or, in the case of biases, eliminate) these problems. 

(iii) We show how slight deviations from prefect redundancy and coplanarity can be incorporated into the analysis, 

with the view that it may be necessary to correct for all these effects in order to achieve the precision calibration needed for 
precision cosmology. 

The rest of the paper is organized as follows. After defining our notation, we describe the simplest possible calibration 
algorithm {i.e. point source calibration) in Section 2.1 In Section 2.3 we review the redundant baseline calibration proposed 
by Wieringa ( 1992 1. Readers familiar with radio interferometer calibration techniques are advised to skip to Section 2.4 where 



we examine Wieringa's scheme more closely by deriving the optimal weightings for its fitting procedure. In Section [2.5| we 
focus on simulations of redundant calibration for monolithic square arrays, which show that existing redundant calibration 
algorithms have certain undesirable error properties. These properties suggest a linearized variation that we propose in Section 
2.7[ In Section [S] we consider redundant calibration algorithms and point source calibration schemes as extreme cases of a 
generalized calibration formalism, which allows us to treat the case of near-redundant baseline calibration. We summarize our 
conclusions in Section [D 



2 CALIBRATION USING REDUNDANT BASELINES 

We begin by defining some basic notation. Suppose an antenna i measures a signal Si at a given instanlj^ This signal can be 
written in terms of a complex gain factor gi, the antenna's instrumental noise contribution Ui, and the true sky signal Xi that 
would be measured in the limit of perfect gain and no noise: 

Si = QiXi + m. (1) 

Each baseline measures the correlation between the two signals from the two participating antennas: 

Cij = (siSj) (2a) 

= g* gj{xi Xj ) + {n'i Uj ) + g* {x* Uj ) + gj {nlxj) (2b) 

* / * \ I res * I res /o \ 

= gigj{XiXj) + = gigjyi-j + , (2c) 

where we have denoted the true correlation {x*Xj) by yi-j, and the angled brackets (...) denote time averages. The index 
i — j is not to be taken literally but is intended to signify the fact that any given correlation should depend only on the relative 
positions of the two antennas. As is usual in radio interferometry, we have assumed that noise contributions from different 
antennas are uncorrelated, and that instrumental noise is uncorrelated with the sky signal. This means that {n*Xj) reduces 
to {n*){xj), and since (ui) — >■ in the limit of long integration times, we can assume that the last three terms in Equation 
|2b| average to some residual noise level n' '^*. The size of n'~''" will of course depend on the details of the instrument, but in 
general will be on the order of Tsys/V tAu, where Tsys is some fiducial system temperature, r is the total integration time, 

^ Note that we are not advocating the redesign of arrays to achieve redundancy for the sake of redundant baseline calibration. The 
design of an array should be driven by its science requirements, and the ability to use redundant calibration should simply be viewed as 
an option to be explored if there happens to be redundancy. 

More precisely, Si refers to frequency domain data at a given instant. In other words, it is the result of dividing the time-ordered data 
into several blocks, each of which is then Fourier transformed in the time direction to give a signal Si that is both a function of frequency 
and time (albeit at a coarser time resolution than the original time-ordered data). 
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and Au is the bandwidth. For the purposes of this paper, the details of the noise term do not matter, and for notational 
convenience we will drop the superscript "res" from now on. The key qualitative results of this paper (such as the fact that 
many existing redundant baseline calibration schemes are biased) do not depend on the noise level. 

It should be noted that the equation for dj presented above does not represent the most general form possible for the 
measured correlations. So-called non-closing errors, for example, can result in gain factors that do not factor neatly on an 
antenna-by-antenna basis. In this paper we will forgo a discussion of these errors, with the view that a good number of these 
contributions can be mitigated (although not eliminated completely) by good hardware design. Alternatively, our results may 
be interpreted as best-case scenario upper limits on calibration quality. 

Since the gain factors in Equation |2c] are in general complex, we can parameterize them by an amplitude gain -q and a 
phase (p by letting gi = e''»+"^'. Our equation then becomes 

= exp [{r/i + rij) + K'Pj - 'Pi)] Ui-j + ^ij- (3) 
The goal of any radio interferometer is to extract the true correlations yi-j (from which one can construct a sky map using 
Fourier transforms) from the measured correlations Cij. Formally, however, this is an unsolvable problem for the generic case, 
as one can see from Equationjs] — with A'" antennas, one has N{N — l)/2 measured correlations (cij's), which are insufficient 
to solve for the A'^(A'^ — l)/2 true correlations (j/i_j's) in addition to the unknown rj's and i/s's. (The Uij noise terms 
are also unknown quantities, but are generally not treated as ones that need to be solved for, but as noise to be minimized 
in a least-squares fit). In abstract terms, all calibration schemes can be thought of as methods for reducing the number of 
unknowns on the right hand side of Equation [S] so that the system of equations becomes overdetermined and the rj's and ip's 
can be solved (or fit) for. 



2.1 Basic Point Source Calibration 

The most straightforward way to calibrate a radio telescope is to image a single bright point source. A point source has 
constant amplitude in tin-space, and thus is a complex number whose complex amplitude is independent of baseline. 
Moreover, the phase of yi-j must also be zero, since by definition we are only "looking" at the bright point source if our radio 
array is phased so that the phase center lies on the source. Thus, yi-j = S, and Equation [S] reduces to 

dj = exp [(j7i + iqj) + i{<pj ~ (fii)] S + rnj, (4) 

which is an overdetermined set of equations for all but the smallest arrays, since we have N{N — l)/2 complex inputs on the 
left hand side but only 2A'^ real numbers (?7's and ip's) to solve for on the right hand side. The system is therefore solvable, 
up to two intrinsic degeneracies: the overall gain "^^rji is indeterminate, as is the overall phase '^^ipi■ These degeneracies, 
however, can be easily dealt with by having some knowledge of our instrument and our calibration source. For instance, the 
overall gain can be computed if the calibrator source is a catalogued source with known flux density. 

It is important to note that the point source calibration method we have presented in this section represents only the 
simplest, most straightforward way in which one could calibrate a radio telescope. There exist far more sophisticated method^ 
such as the many self-calibration schemes that are capable of calibrating a radio telescope off most reasonable sky signals. In 
practice, one almost never needs to use basic point source calibration, for it is just as easy to use a self-calibration algorithm. 
The scheme described here should therefore be considered as no more than a "toy model" for calibration. 



2.2 Redundant Baseline Calibration 



The main drawback of the approach described in the previous section is that it requires the existence of an isolated bright 
point source. This requirement becomes increasingly difficult to fulfill as cosmological science drivers push radio telescopes 
towards regimes of higher sensitivity and wider fields of view. While self calibration methods have been shown to produce high 
dynamic range maps over wide fields of view and do not require the existence of isolated bright point sources, they are reliant 
on having a reasonable a priori model of the sky. Thus, if possible it may be preferable to opt for redundant calibration, 
which is completely model independent. With the emergence of telescopes with high levels of redundancy (such as the Cylinder 
Radio Telescope — see Peterson et al. ( 2006 1 — or any omniscope^like those described in Tegmark & Zaldarriaga ( 2009a b \) 



redundant baseline calibration becomes a competitive possibility. 

Mathematically, if an array has a large number of redundant baselines, then one can reduce the number of unknowns on 
the right hand side of Equation|3]by demanding that the true visibilities yt-j for a set of identical baselines be the same. Since 
the number of measured visibilities dj's stays the same, our system of equations becomes overdetermined provided there are 
a sufficient number of redundant baselines, and it becomes possible to fit for the rj and ip parameters. 



^ See Cornwell & Fomalont (19991 or Rau et al. (20091, for example, for nice reviews. 

* We detine an ommscope as any instrument where antenna elements are arranged on a regular grid, and full digitized data is collected 
on a per element basis without any beamforming. 



4 Adrian Liu, Max Tegmark, Scott Morrison, Andrew Lutomirski, Matias Zaldarriaga 



As an example, consider a one-dimensional array of five radio antennas where the antennas are equally spaced. In this 
case, one has four unique baselines (of lengths 1, 2, 3, and 4), and 10 measured correlations. Thus, one must fit for 18 real 
numbers (four complex numbers from the true visibilities of the four unique baselines, plus five rj^s and five Lp's) from 20 
numbers (10 complex measured correlations): 

Ci,2 = ex.p[{rii + ri2) + i{ip2 — ipi)]yi + ni,2 
C2,3 = exp [{ri2 + r)-i) + i^-^-i — (^2)] J/i + n2,3 

: (all four baselines of length 1) 

Ci,3 = exp [(ryi + J73) + i(v'3 — Vi)] 2/2 + ni,3 

C2,4 = exp [(772 + 774) + 4(9^4 — ^2)] y2 + n2,4 (5) 

: (all three baselines of length 2) 

Ci,4 = exp [(ryi + 774) + i(((24 " <y5l)] 2/3 + 711,4 

: (both baselines of length 3) 

Cl,5 = exp [(7?i + 775) + 7(^35 - 7/4 + 711,5. 



As is usual in radio interferometry, we have omitted equations corresponding to baselines of length zero, i.e. autocorrelations 
of the data from a single antenna. This is because autocorrelation measurements carry with them correlated noise terms, and 
thus are likely to be much noiser than other correlated measurements. 

In the following sections we will present two methods for explicitly solving Equation [5] We begin in Section [2. 3| with a 
logarithmic method, which is very similar to one described by Wieringa ( 1992 1. This method is the simplest way to implement 



a redundant baseline calibration scheme, but suffers from several problems which we detail in Section [2.5| and [2.6| In Section 
|2.7| we show how these problems can be solved with an alternative method based on linearization. 



2.3 The Logarithmic Implementation 

We proceed by rewriting our equations in the following way: 

Cij = glgiVi-j ( 1 + — ) ■ (6) 

Taking the logarithm gives 

In dj = 7/i + rij + - V^O + In yi-j + In ( 1 + ) , (7) 

\ yidjyi-j J 

V ' 

and requiring that the real and imaginary parts be separately equal gives linear equations of the form 



In |ci,jl = 77,; + T/j + In \yi-j\ + Re7;;ij 
arg|ci,jj = ipj—ifi + ajg\yi-j\+lra'Wij. 



(8a) 
(8b) 
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Note that the amphtude and phase cahbrations have now decouplecQ and can thus be written as two separate matrix systems. 
The amplitude cahbration, for example, takes the form 
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(9) 



for the one-dimensional, five-element array considered in Section [2.2| The vector d stores the measured correlations, the matrix 
A is completely determined by the array configuration, and the x is what we hope to solve for. 

As it currently stands, however, this set of linear equations has no unique solution even in the absence of noise {i.e. even 
when w = 0). This can be seen from the fact that the vector x„uii = (1, 1, 1, 1, 1, —2, —2, —2, —2) (written here as a row vector 
for convenience) lies in the null space of A {i.e. Ax„u;; = 0), so for any solution xq, one can form a new solution xq -f x,ii,n- 
The new solution corresponds to adding a constant to all ry's, which — since g = exp{ri + i(fi) — is equivalent to multiplying all 
gains by a constant factor and simultaneously dividing all the sky signals by the same factor. Physically, this is an expression 
of the fact that internal calibration schemes {i.e. schemes like this one where the calibration is done by exploiting internal 
mathematical consistencies) are insensitive to the absolute gain of the radio interferometer as a whole, a problem which is 
present even performing a simple point source calibration, as we noted in Section [2. 1[ 

This mathematical degeneracy in absolute amplitude calibration can be broken by arbitrarily specifying an absolute gain 
level. For example, one can add the equation 



: 



(10) 



to the system of equations, thus ensuring that one cannot uniformly elevate the gain levels and still satisfy all constraints. 
The set of equations governing the phase calibration require the addition of a similar equation, since the absolute phase of a 
system is not a physical meaningful quantity and therefore must be arbitrarily specified: 

i 

With the phase calibration, however, there exist two additional degeneracies: the calibration is insensitive to tilts of the entire 
telescope in either the x or the y direction, since such tilts are equivalent to rotations of the sky and redundant algorithms 
are (by design) independent of the nature of the sky signal. To break these degeneracies, one adds the following equations: 







= 



i 

E 



(12a) 
(12b) 



where = {rx,i,ry^i) denotes the physical location of the ith antenna in the array. While these extra equations are somewhat 
arbitrary (the L.H.S. of Equation 10 for instance, can be any real number), the true gain, phase, and tilt parameters can 
always be fixed after-the-fact by referring to published flux densities and locations of known, catalogued bright point sources 
in the final sky maps. 

With these four degeneracies broken, our equations can be solved using the familiar least-squares estimator x for the 
parameter vector: 

x= [A*N"^A]"^ A*N"M. (13) 
Here, N is the noise covariance matrix, which takes the form (RewRew*) for the amplitude calibration and (Imwlmw*) 



^ From a rigorous standpoint, it is not strictly true that the amplitude and phase calibrations of an interferometer decouple. Indeed, 
this is evident even in our simple model, where Rewij and Irawij both contain factors of the complex gain g = e^^^'^ , which means 
that Equations |8a| and |8b| are in principle coupled, even if only weakly so. However, one must remember that the noise parameters are 
treated simply as given numbers which are not solved for when solving these calibration equations. This means that in finding the best 
fit solution to Equations |8a| and |8b| the rj and ip calibrations can be performed separately. It is only in this sense that the systems are 
"decoupled" . 
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for the phase cahbration. In the following section we examine the form of this matrix in detail. Note that since the parameter 
vector X contains not only the calibration factors rj and ip but also the true sky correlations Vi-j, one can solve for the true 
sky signal directly from the uncalibrated correlations d, as one expects from any self or redundant calibration scheme. 

This concludes our review of the Wieringa ( 1992 1 redundant calibration method, cast in notation conducive to the 
development of the new material that follows. It is worth noting that in Wieringa (19921, the noise covariance matrix N is 
set to the identity. In the next section, we show that this is not the optimal weighting to use for the calibration fit, and derive 
the noise covariance matrix that corresponds to inverse variance weighting. 



2.4 The Noise Covariance Matrix 



The form of N will depend on one's model for the instrumental noise. In what follows, we will for convenience assume that 
the noise is Gaussian, small compared to the signal, and uncorrelated between baselines; generalization to other noise models 
is straightforwarcQ Although these assumptions are not necessary for calibration (since Equation 13 can be implemented for 



any invertible choice of N) , making them will allow us to gain a more intuitive understanding of the errors involved. Recalling 
the form of the noise term after taking the logarithm of our equations, we can say 

= In f 1 + , '^'^ ) ^ ^ "'^^ , (14) 

where we have invoked the assumption of high signal {g*gjyi-j) to noise (riij) to expand the logarithm. Since dj = g* QjUi-j + 
riij, we can rewrite this as 

Wij « ^JH^ « (15) 

where we have neglected higher order terms. Equation [15] is more convenient than Equation |14| because it is written in terms of 
the measured correlation Cij instead of the noiseless g*gjyi-j. Since the logarithmic scheme separates the real and imaginary 
parts of the calibration, the crucial quantity is not Wij but rather its real (or imaginary) part; 

Wij ^ vir^x+irLy) (16a) 

= — !— (cos — i sin (/))(n2, + iriy) (16b) 

^ ReiUij — 7 (tij: cos + % sin (?!)) , (16c) 

I Cij 1 

where (ji = axgCij. With this result, we can form the noise covariance matrix N = (RewRew*). Since we are assuming that 
the noise is uncorrelated between different baselines, our matrix must be diagonal, and its components can be written in the 
form = ((Re Wa)'^)5a/3 where S^p is the Kronecker delta, and a and /3 are Greek baseline indices formed from pairs 

of Latin antenna indices. For example, the baseline formed by antennas i = 1 and j = 2 might be labeled baseline a = 1, 
whereas that formed by antennas i = 1 and j = ?> might be given baseline index a = 2. The diagonal components are given 
by 

Nca = (naUa) (17a) 

= j-^-j^ {{uxUx) cos^ (f> + 2 (n^ny) COS (f> sin (f) + (nyUy) sin^ (fi) (17b) 
= <t2 =0 =0-2 

= ^'''^ 

where we have assumed that the real and imaginary parts of the residual noise n are independently Gaussian distributed 
with spread a^. This expression provides a simple prescription for the inverse variance weighting of Equation 13 one simply 



weights each measured correlation by the square modulus of itself (the in the numerator is irrelevant in the computation 
of the estimator x because the two factors of N in Equation [l3] ensure that any constants of proportionality cancel out). 

For the phase calibration, the fact that inverse variance weighting corresponds to weighting by l/|cp has a simple 
interpretation. In Figure [l] we plot various correlations on the complex plane. The green dots represent a set of simulated 
noiseless visibilities, while the red triangles show what the baselines of a noisy 4 by 4 regular square antenna array would 
measure. From the plot, it is clear that for a given amount of noise, visibilities that are closer to the origin of the complex 



^ Typically, the electronic noise will be uncorrelated between different antennas/amplifier chains. This means that the noise correlation 
matrix N for the baseline will be highly sparse but not diagonal; away from its diagonal, it will have nonzero entries for precisely those 
pairs of baselines which have an antenna in common. This sparseness is very helpful, making the computations numerically feasible even 
for massive arrays as described below. 
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Figure 1. A plot of true visibilities (green dots) and measured noisy visibilities (red triangles) on the complex plane. The blue lines 
delineate the phase extent of the noisy visibilities from two sets of redundant baselines. One sees that in the presence of noise, visibilities 
closer to the origin are more susceptible to loss of phase information. 
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Figure 2. Scatter plot of simulated vs. recovered antenna gain parameter rj for a 4 by 4 square array with a signal-to-noise ratio of 10. 

plane (i.e. those with smaller |c|) are more susceptible to loss of phase (argc) information from noise. The gain (log jc|) errors 
are similarly large for such visibilities. These visibilities should be given less weight in the overall fit, which is what the inverse 
variance weighting achieves. 

2.5 Simulation Results and Error Properties 

In Figures [2] and |3] we show simulated calibration results of a 16 antenna element interferometer. The sky signal used in 
the simulations was that of a Gaussian random field. That such a sky is somewhat unrealistic is irrelevant, since redundant 
calibration schemes are sky-independenl[^ by construction. 

That is, provided we exclude unreasonable mathematical exceptions such as a sky devoid of any sources. 
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Figure 3. Scatter plot of simulated vs. recovered antenna phase parameter tf for a 4 by 4 square array with a signal-to-noise ratio of 10. 



Like in Figure [T] the antennas in these simulations were arranged in a 4 by 4 square grid, with the antenna spacings 
assumed to be completely regular. The precise physical separation between adjacent antennas need not be specified, because a 
dilation of the entire array layout changes only the sky signal measured by the interferometer, and not the pattern of baseline 
redundancies. Repeating the simulations with rescaled versions of the same array is therefore unnecessary, again because of 
the fact that we can calibrate from any sky signal. The same reasoning makes it unnecessary to specify an observing frequency 
for the simulations. 

In the plots, it is clear that there is some scatter in the recovered antenna parameters. This is due to non-zero instrumental 
noise, which was simulated to be Gaussian i.e. riij in Equation [2c] was taken to be a Gaussian random variable. Rather than 
using a specific noise model to fix the amplitude of the noise, we simply parameterize the problem in terms of the signal-to- 
noise ratio (SNR). Figures [2] and [s] show the results for an SNR of 10. (Such an SNR would be typical for an interferometer 
with bandwidth Au ~ 10 kHz and integration time r « 10s observing Galactic synchrotron emission at 150 MHz). The scatter 
seen in these figures can be viewed as error bars in the final calibration parameters outputted from the calibration. To quantify 
these errors, we can compute the deviation e = x — x and form the quantity 

S = (ee') = [A*N-'A]"\ (18) 

where the last equality can be proved with a little algebra and Equation 13 ( Tegmark|1997 1. The square root of the diagonal 
elements of E then give us the expected error bars on the corresponding elements in x. For example, cti, the expected error 
in the calibration of the first antenna's r/, is given by (Sn)^'''^ if one arranges the elements of the parameter vector x in the 
manner shown in Equation [9] 

The fact that our system is linear means that to understand the errors in our calibration, we need only compute error 
bars from simulations of systems with an SNR of 1; the errors from systems of arbitrary SNR can be scaled accordingly. To 
see this, consider Equation |17c| which we can rewrite by scaling out some fiducial mean value |co| for the magnitude of the 
visibilities: 



N, 



a/3 = 



j3a/3 = 



Sal3 = 



(SNR)2 \c'^ 



~ (SNR)2 ' 

where we have defined normalized visibilities c'^ 
Equation [18] gives 



and a normalized noise covariance matrix N' 



(19a) 
(19b) 

Plugging this into 

(SNR)n-- -J • (20) 
We therefore see that in what follows, general conclusions for any SNR can be scaled from our simulations of a system with 
an SNR of 1. 

Although redundant calibration schemes make it possible to calibrate an interferometer using any reasonable sky signal, 
it is important to point out that the quality of the calibration, or in other words, the error bars predicted by Equation |20| are 
sky-dependent. To see this, consider a situation where the sky signal is dominated by a small number of Fourier modes whose 
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Figure 4. Expected error bars in recovered t] for diflferent antennas in an 18 by 18 square array with a signal-to- noise (SNR) ratio of 1, 
although in a realistic situation one would expect a more favorable SNR. 



wavevectors are of precisely the right magnitude and orientation for its power to be picked up by only the longest baselines of 
an array. Since our noise covariance matrix is proportional to l/|c|, using such a sky to calibrate our array will give small error 
bars for the antennas that are part of the longest baselines and relatively large error bars for all other antennas. On the other 
hand, a sky that has comparable power across all scales is likely to yield calibration error bars that are more consistent across 
different antennas. Given this dependence of calibration quality on sky signal, in what follows we compute ensemble-averaged 
error bars based on random realizations of our Gaussian skies to give a sense of the typically attainable accuracy. 

From our simulations, we find that the dependence of calibration quality on antenna location is quite weak. This can 
be seen from the scale on Figure |4] where we show the expected error bars (estimated by taking an average of 30 random 
realizations) in the recovered rj for the antennas that comprise a regularly spaced 18 by 18 square interferometer. Far more 
important than antenna location in determining the calibration errors is the total number of antennas in an array. In Figure|5j 
we show the averag^error in ?7 as a function of the number of antennas in a square array. The error E^''^ roughly asymptotes 
to a 1/y/N dependence, where A'^ is the total number of antennas in the array, as would be expected if the calibration algorithm 
is thought of as combining independent information from different redundant baselines. Precisely the same trend is seen in 
the (f errors. 

In running the simulations, we find that minor variations in the baseline distribution have little effect on the final 
calibration errorij^ For instance, varying the aspect ratio of a 100 antenna system from a 10 by 10 array, to a 5 by 20, 4 by 
25, and a 1 by 100 array has mostly minimal effects on the rj and ip errors. Thus, for an array with baseline redundancies that 
are at roughly the same level as for a square array, we can put everything together and arrive at the following rule of thumb 
for the calibration errors: 

A, = E,y^«A^ = Ey^«^4=, (21) 



where A'^ is again the total number of antennas in the array. 

Not captured by Equation [21] is the fact that antennas closer to the center of an array tend to have lower calibration 
errors than those that are peripherally located, as can be seen in Figure |4] Intuitively, this is due to the fact that while every 
antenna forms the same number of baselines (because each antenna forms a baseline with all other antennas in the array), 
antennas that are located in different parts of the array form different sets of baselines. An antenna that is in the corner of 
a square array, for example, forms A'^ — 1 baselines that are all different from each other, whereas an antenna that is close to 
the center will form baselines in redundant pairs because of the symmetry of the square array. In the system of equations, the 
centrally located antenna will therefore be more tightly constrained by the data, and the resulting error bars in its calibration 



* Averaging both over different antennas in an array and over different simulations. 

^ Major variations, of course, can drastically affect the errors. An extreme example of this would be a change in baseline distribution 
that completely destroys all redundancy, which would render the algorithm unusable and formally give rise to infinite error bars. 
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Figure 5. Expected average error bars in recovered r; as a function of the size of the simulated square array. All simulations have a 
signal-to-noise (SNR) ratio of 1, although in a realistic situation one would expect a more favorable SNR. 

parameters will be smaller. These differences are, however, rather minimal and are unlikely to significantly affect the actual 
calibration of a real (non-simulated) array. 

As mentioned above, in addition to the antenna calibration parameters 77 and tp, redundant calibration algorithms yield 
estimates yij for the true sky visibilities. In Figures §[7] and[8] we show the visibility results that fall out of the calibration. 
Figure [6] serves as a legend for the uv plane, with different colors and symbols signifying the different unique baselines found in 
a 4 by 4 square array. These symbols are used to plot the uncalibrated input correlations measured by the interferometer (i.e. 
the Cij's) on a complex plane in Figure [t] Each color and symbol combination may appear multiple times in Figure [7| because 
two baselines that are identical in principle may give different readings in practice because of instrumental noise and the fact 
that different antennas will have different calibration parameters. Indeed, one can see from Figure [7] that the uncalibrated 
correlations are rather spread out over the complex plane, and do not seem close to the simulated true sky visibilities, which 
are denoted in the figure by solid magenta dots. 

After calibration, however, one has estimates for the antenna calibration parameters, and thus these uncalibrated cor- 
relations can be corrected to yield measurements that are close to the correct sky visibilities. One simply divides out the 
complex gain factors, since the measured correlations cij are related to the true sky visibilities by Cij = g*gjyi-j. The 
results are shown in Figure [Sj where it is clear from the color and symbol combinations that identical baselines now yield 
measurements that cluster around the simulated true values, which are still given by the solid magenta dots. The scatter that 
remains within a given set of identical baselines is due to instrumental noise (set at a level corresponding to an SNR of 10 
for this simulation). It should be noted, however, that while this is a perfectly workable method for computing estimates of 
the sky visibilities, it is unnecessary. Redundant calibration outputs a parameter vector estimate x (see Equation |13[ ) that 
contains the true visibilities. These are shown in Figure [8] using magenta x's, and emerge from the calibration at no additional 
computational coslf^ 

Since the true visibilities (like the antenna gain parameters) are simply components of the parameter vector x, we can 
again use the machinery of Equation [20] to compute the expected error bars in the estimated visibilities. The results are 
similar to what was found for the antenna calibration parameters, and are summarized in Figure [9] The errors are agam 
found to asymptote to a 1/ y/N dependence, but here refers to the number of redundant baselines that go into determining 
a given visibility, and not the total number of antennas in the array. That the error bars do not depend on the total number 
of antennas is clear from Figure |9] where data points from simulations of arrays of various sizes lie on top of each other. It 
is also a property that is intuitively expected, since adding extra antennas increases the mathematical constraints on a given 
visibility only if the extra antennas form new baselines that correspond to that visibility's location on the uv plane. 

In fact, it is not even strictly valid to consider any "extra" computational cost associated with solving for the yij's, for as one can 
see from Equation [9| the inclusion of the yij's is necessary for redundant calibration to work. 
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Figure 6. Legend of baselines for Figures [T] and [s] 
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Figure 7. Uncalibrated visibility measurements plotted on the complex plane according to the uv plane baseline legend provided by 
FigurelG] The simulated {i.e. true) sky visibilities are given by the magenta dots. 



2.6 Problems with the logarithmic implementation 

While our simulations and real- world applications like those detailed in Noordam & de Bruyn ( 1982 1; Ishiguro ( 1974 1; Ramesh 



et al.||T999l ); |Wieringa| ( |1992| ; |Yang| ( |1988| l; [Pearson fc Readhead] ( |1984 ) have shown above that taking logarithms of Equation 
3] yields a linear system that can be successfully used to fit for calibration parameters and visibilities, this logarithmic 
implementation is not without its drawbacks. In particular, the implementation has two undesirable properties: 
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Figure 8. Same as Figure [7] except the visibilities have been calibrated. The simulated sky visibilities are again given by the magenta 
dots, while the best estimates from the calibration algorithm for these visibilities are given by the magenta x's. 
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Figure 9. Expected error bars on the estimated visibilities, as a function of the number of redundant baselines that go into determining 
each visibility. A signal-to-noisc (SNR) ratio of 1 is assumed, although in a realistic situation one would expect a more favorable SNR. 



2.6.1 Phase Wrapping 

For the logarithmic implementation to work, the phase calibration parameter <^ needs to be close to zero for all antennas. 
This is because one must take the complex logarithm of Cij — g*gjyi-j, which is an operation that is determined only up to 
additions or subtractions of multiples of 2-k in the imaginary part {i.e. in the phase of the original number). Thus, while the 
calibration solution may correctly recover the measured correlations (since one is insensitive to additions or subtractions of 
2n in the phase when re-exponentiating to find Cij — g*gjyi-j), it does not give correct phases for g* , gj, and yi-j, which 
are ultimately the quantities of interest. This is a problem even when only a small number of antennas have large phase 
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Figure 10. Scatter plots of simulated vs. recovered antenna phase parameter ip for a noiseless 4 by 4 square array with relatively large 
phases. The results from the logarithmic implementation described in Section [23] are shown using open blue circles, whereas the results 
from the linear implementation described in Section |2.7| arc shown using solid red circles. Large phases are seen to affect the accuracy of 
the logarithmic method, but not the linear method. 



calibration parameters, since to find the visibilities and antenna gain parameters redundant algorithms essentially performs a 
global fit, where all antennas are coupled to each other via the visibilities. 

In Figure [lO] we sliow the phase calibration results for a noiseless simulation of a 4 by 4 array with phases that are 
significantly greater than zero. The logarithmic method is shown using the open blue circles, and since a noiseless simulation 
should yield perfect parameter recovery, the scatter in the trend suggests a problem with the method. A detailed examination 
of the simulation's numerical results reveals that it is indeed the multi- valued property of the complex logarithm that causes 
the trouble, and from the plot one also sees that a small number of very badly fit antennas can give rise to inaccuracies for 
all antennas, as is expected from the global nature of the fit. 

It is important to note that even when the phase calibration fails because of large phases, the amplitude calibration can 
still be implemented successfully. This is because the taking of the complex logarithm means that in our system of equations, 
the amplitudes and phases separate into the real and imaginary parts respectively, as we can see from Equations |8a| and |8b[ 
The two calibrations are therefore completely decoupled, and instabilities in one do not affect the other. The fact that an 
accurate gain calibration can be extracted regardless of the quality of one's phase calibration will be important in Section [Ts] 



2.6.2 Bias 

The logarithmic method is not unbiased, in the sense that ensemble averages of noisy simulations do not converge to the true 



simulated parameter values. While Equation 13 can be shown to be unbiased (Tegmark 19971, the proof assumes that the 



noise covariance matrix N is Gaussian and that the weights used in the fits (encoded by N~^) are independent of the data. 
In our case both assumptions are violated. While we may assume that and Uy in Equation 1 1 7b| are Gaussian distributed, 
this is not the case in amplitude/phase angle space. In addition, since the diagonal entries of N are taken to be l/|c|, the 
measured data enter the least squares fit (Equation 131 not just through the data vector d but also via N~^, making the 
fitting process a nonlinear one. 

The bias in the method can be easily seen in Figure [TT| where the blue dots show the simulated results for t], averaged 
over 90 random ensemble realizations of a 4 by 4 array with an SNR of 2. There is clearly a systematic bias in the recovered 
output ri's. 



2.7 The Linearized Approach 

Since both of the problems discussed in the previous section arise at least partly from taking the logarithm of Equation |6] 
we now propose a method that does not require that step. As an alternative to taking logarithms, one can instead linearize 
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Figure 11. Scatter plots of simulated vs. ensemble averaged recovered antenna phase parameter Lp for a 4 by 4 square array with a 
signal-to-noise ratio of 2. The results from the logarithmic implementation described in Section |2.3| are shown using open blue circles, 
whereas the results from the linear implementation described in Section |2.7| are shown using solid red circles. The logarithmic algorithm 
gives a biased result, whereas the linear implementation does not. 



the equations in Section [2.2[ We do so by Taylor expanding our expressions about some fiducial guesses ?7o, fo, and j/o for the 
parameters 77, and yo. The gain of a particular antenna, for example, becomes 



(22 



a 



dq 
or] 



{<p-<po) + ... (22b) 
170, ¥=0 

exp (r/o + ifo) [! + (??- no) + i{f - fo)] + ■■■ (22c) 



Along with our fiducial guess for the true correlations (the y's), we define guesses for our measured correlations: 

_ / -On / , ■ 0\ /no^ 

Cij = Vi-j exp [rji ~ vpi ) exp [r/j + i<pj) . (23) 

In addition, we define the deviation 5ij of the guessed correlations from the measured correlations 

5,, = - 4 (24) 

as well as an analogous quantity for the true sky correlations 

y]_j=yi-j-y°_j. (25) 

To calibrate our telescope, we take the measured deviations 5ij as our inputs and compute the corrections to the guessed 
antenna parameters and guessed true sky correlations that are required to match these deviations. Plugging our definitions 
and expansions into Cij = gtgjyi-j gives 

Sij ^ exp [rfi - ii/l) exp (r?" + iip°) [yl_j + y°_j {Arn + Arjj - iAipt + iAtpj)] (26) 

where we have defined A77 = 77 — r;o, Aip = ip~ipo, and have discarded second order terms in the small quantities Arj, Aip, and 
. The result is a linear system in At], Aip, and y^ , which means we can use the matrix formalism presented in the previous 
section, the only differences being that the 77 and equations are now coupled, and that the matrix analogous to the A used 
in the logarithmic method — which we will call B for the linearized method — now depends on the fiducial guesses as well 
as the array layout. A least-squares fit can once again be employed to solve for the Ar;'s, the Ay's, and yl-j- Once the fit 
has been obtained, it can be used to update our fiducial guesses, and if desired, one can continue to improve the quality of 
the calibration by repeating the algorithm iteratively — the updated calibration parameters are simply used as the fiducial 
guesses for the next cycle of fitting. Note that unlike before, it is unnecessary to use a weighted fit i.e. one can set N = I 



in Equation 13 This is because the linearized algorithm works with visibilities directly (as opposed to, say, their logarithm 
or some other function of the visibilities), and since our assumption of uncorrelated baseline errors means that individual 
baselines have the same noise level, all correlations should be weighted the same in the fitting. 

In Figure |12| we show plots of simulated and recovered visibilities on the complex plane as one steps through each 
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Figure 12. Visibility simulations, measurements, and estimated visibilities plotted on complex planes. The top left plane shows the 
simulated and measured visibilities before any calibration scheme has been applied. The top right plane shows the simulated visibilities, 
corrected visibilities, and best fit visibilities after one iteration. The bottom planes show the same quantities after two and four iterations 
respectively. 



iteration of the algorithm. The top left plane shows the simulated correlations as well as the measured correlations. After one 
iteration, we plot the corrected visibilities as well as the estimated true visibilities in the top right. The recovered visibilities 
are seen to be still quite different from the simulated ones. The bottom left and bottom right planes show the second and 
fourth iterations respectively, and one can see that the recovered visibilities converge toward their simulated values. 

Scatter plots showing the correlation between the simulated and estimated antenna calibration parameters are qualita- 
tively identical to Figures [2] and [3] for the logarithmic implementation, so we do not show them here. However, the linearized 
approach solves the two problems with the logarithmic algorithm discussed in Section |2.6| In Figure |10[ the filled red circles 
representing the linearized method show a perfect recovery of simulated phases in the noiseless limit. This demonstrates that 
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the linearized algorithm can tolerate arbitrarily large phases, which the logarithmic method could not. Figure [TT] shows that 
the ensemble-averaged parameter estimates coming out of the linearized method converge to their true values, demonstrating 
that the linearized algorithm is unbiased. That this is the case is particularly important if one applies the algorithm to highly 
noisy systems, for then one can compensate for the low SNR by using massive arrays with large numbers of baselines and by 
averaging the results over long time periodj^ 



2.8 Numerical issues for the linearized approach 

Although the linearized algorithm evades the problems with the logarithmic approach, it is slightly more complicated to put 
into practice. Below we discuss two problems that may arise in the numerical implementation of the method and how to solve 
them. 



2.8.1 Convergence 

Bad fiducial guesses can result in misfits, although such fits can be easily identified and improved upon. Given that our basic 
measurement equation (Equation |3| is nonlinear in our parameters, the linearization that we have presented in this section 
can only be expected to stably yield good fits when our fiducial guesses are reasonably close to their true values. In a naive 
implementation of the algorithm, bad fiducial guesses can result in an inability to iterate out of a local maximum in goodness- 
of-fit. However, bad fits can be readily identified by simply computing the goodness-of-fit parameter, and if necessary, one 
can repeat the fit with a better search algorithm such as simulated annealing to find the correct maximum. 

Another way to avoid misfits is to use the results of the logarithmic algorithm as our initial guess for the linearized method. 
Even though the phase calibration has been shown to be unreliable when the phases are large, the amplitude calibration can 
still be trusted since it comes from solving a completely separate system of equations. The fact that the amplitudes estimated 
using the logarithmic method are biased is not an issue, since our fiducial guess serves merely as a starting point for our 
iterative linearized scheme. 

An attractive approach to implementing our method in practice could be to update the calibration on the timescale over 
which the calibration parameters are expected to drift by non-negligible amounts (say once per second or once per minute), 
using the previously determined calibration parameters as the initial guess and performing merely a single iteration each 
time. Avoiding multiple iterations in this fashion has the advantage that the noise from any given observation affects the 
output only linearly, thus avoiding nonlinear mappings that can potentially bias the results. The optimal duration between 
calibrations should be determined to minize the expected errors: if we calibrate too frequently, the calibration parameters 
get unnecessarily noisy because so little data is used to determine them, while if we calibrate too rarely, the true calibration 
parameters can drift appreciably before we recalibrate. An alternative way to determine this tradeoff is to re-estimate the 
calibration parameters more frequently than needed and then smooth the time series, each time replacing the calibration 
parameter vector actually used by p times the new estimate and (1 — p) times the last estimate, and determining which choice 
for p £ (0, 1) minimizes the rms calibration errors. 



2.8.2 Computational Cost 

The linear method is more computationally intensive than the logarithmic one, especially if more than one iteration is 
required. Ifowever, the computational cost is still acceptable if one takes advantage of the sparseness of B. At first sight, 
the setting of N = I in Equation [13] for the linearized method but not for the logarithmic method would seem to make the 
linear method computationally quicker. However, the presence of a non-trivial N in the logarithmic method does not increase 
the computational cost, for in the absence of cross-talk N is diagonal. The computational cost of its inversion is therefore 
negligible, and its presence in Equation [13] simply changes the weighting of matrix elements when finding matrix products. 

That the linear method is slightly more computationally intensive than the logarithmic method is due to the fact that 
the matrices for the linear method are larger than those in the logarithmic method. This is because the amplitude and 
phase calibrations do not decouple in the linearized scheme, and all the parameters must be solved for in one big system. In 
particular, all vectors are now twice as long as before, since a system where gains and phases are coupled is mathematically 
equivalent to one where the real and imaginary parts of all complex numbers are dealt with together. This has its greatest 
computational impact on the matrix inversion of B*B, which can scale as strongly as the vector length cubed if one simply 
implements the most straightforward matrix inversion routines. Since this inversion must be performed every time the fiducial 
values (which are part of B) are updated, the computational time also scales linearly with the number of iterations, and so 
naively the linearized method seems rather expensive. However, the matrix B is by construction always sparse, which means 



That is, provided any natural drift in the calibration parameters occur over longer timescales than the required averaging time. 
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the matrix inversion can be performed more efficiently. As an example, one may use the conjugate gradient method, where 
the sparseness of B means that the inversion scales linearly with vector length, improving the numerical cost scaling from 
0(iV3) to 0{N). This is very similar to the mapmaking approach successfully implemented by the WMAP team in |Hinshaw| 



et al. (20031. 



It should also be noted that if only one iteration is required and the gain parameters drift only by small amounts over time, 
then the linearized method can be extremely efficient over long integrations. This is because a series of calibrations can then 
be performed using the same fiducial values for the antenna parameters, and thus the matrix [B*B]^"'^B* never changes and 
can be computed once and for alj^ One then simply stores this matrix, which is applied to the data whenever calibration is 
required. Note that this cannot be done with the logarithmic approach, since the relevant matrix there is [A^N^""" A]^''"A*N~"'^ 



and from Equation 17c we see that N changes as the measured correlations change, so the combination [A'N^""" A]^''^A*N^"'^ 
must be recomputed every time one would like to calibrate. 

If an instrument with A*' antennas possesses a very large number of redundant baselines (like any reasonably large 
omniscope, say, where almost all of the ~ baselines are redundant). Figure [o] suggests that the errors will be extremely 

small, especially if the signal-to-noise ratio is high. With such an instrument, one will be able to calibrate accurately by utilizing 
merely a small subset of the redundant baselines; for example, if each antenna is correlated with merely 20 others, the resulting 
system of equations will still be safely overdetermined while keeping the size of the B-matrix and the computational cost 
reasonable. This is important for any array utilizing fast Fourier transforms for the correlation process, since its attractive 
A^log2 A'^ cost scaling would be lost if an 0{N'^) correlator were needed for calibration. 

In summary, our proposed calibration method can be performed rather cheaply, especially when one compares the com- 
putational cost to that of initially computing the interferometric correlations. To see this, note that one must compute 
correlations roughly once every 10~* seconds for an interferometer operating at a frequency of ~ 100 MHz. This is much more 
expensive than the calibration step (especially if one incorporates the computational tricks suggested in this section), which 
only needs to be performed, say, once every second or minute. In other words, the correlation needs to be repeated on the 
timescale on which the electromagnetic waves oscillate whereas the calibration only needs to be repeated the time scale on 
which the calibration parameters drift. 



3 A GENERALIZED CALIBRATION FORMALISM: CALIBRATION AS A PERTURBATIVE 
EXPANSION 

In previous sections, the calibration algorithms that were discussed represented two idealized limits: with the bright point 



source calibration (Section 2.1 1, the existence of an isolated bright point source was key, while with redundant baseline 
algorithms (Section 2.2 1 a large number of baselines of equal length and orientation were required. Both algorithms were 
"zeroth order algorithms" in that they required exact adherence to these requirements. In this section, we perturb around 
these idealized assumptions, and show that point source calibration and redundant calibration are simply two extremes in a 
generalized continuum of calibration schemes. 

Consider the sky response {i.e. the visibility) from a baseline b formed by two antennas at locations and Vj {i.e. 
h = Ti — fj). Assuming that our array is coplanar, we have 



J{e)B{0) f i27rb-0 



where J{0) is the intensity of the sky signal from the direction, and B{6) is the primary beam. Suppose we now define an 
effective sky signal I{0) = -^E^ffij. Writing th is in terms of its Fourier space description 

I{d)= [ /(k)exp(i27rk- 0)d^k (28) 



c(b) = gtg.I I ^) , (29) 



and inserting this into Equation |27[ we get 

r(h^ = n*n J ( ^ 

which is a well-known result: baselines of an interferometer probe different Fourier modes of the effective sky signal. 

Now suppose that our interferometer possesses a number of nearly-redundant baselines. In other words, suppose that there 
exist a set of vectors bg = {bj, bp, . . . } on the uv plane that baselines tend to cluster around. Nearly-redundant baselines are 
considered part of the same cluster, and are associated with a single bo . On the other hand, any baselines that are isolated on 
the uv plane are associated with "their own" bp . Thus, for an array with A*' antennas the index a runs from 1 to N{N ~ l)/2 

In principle, one also requires that transients are not present in the data, since transients may cause sudden changes in the visibilities 
yij and thus one cannot assume that the same fiducial guesses are good ones over all time. In practice, however, transients can be easily 
identified in the data, and one can simply ignore the data taken over time intervals where transients dominate. 
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if there are zero near or exact redundancies in the baselines, and to some number substantially smaller than N{N — l)/2 if 
there are many near or exact redundancies. In other words, we have grouped the baselines into nearly-redundant clusters. 

We now re-parameterize our baseline distribution in terms of Shij = hij — bp , which measures each baseline's deviation 
from perfect redundancy. Since Shij is a small quantity, we can Taylor expand Equation |29| 

Cij = gi9jl[ -j (30a) 



gi9j 



b=b" 



(30b) 



where Vu denotes a two-dimensional gradient in the uv plane. If we define Cq = I (^~x~^ i ^6 can rewrite our equation as 

Cij ~ gtgjco ^H- ho • + . . . ^ , (31) 

where hS^ = V In/ 

For algebraic brevity, we proceed using the logarithmic method of Section [2.3| the generalization to the linearized method 
being straightforward. Taking logarithms of both sides yields 



In Cij ^ (j7i +rij) + i(<pj — ipi) + In Cq + In ^1 -|- ho 



~ (jyi +%) + i('y3j - ¥^0 +lnco + ho ■ ^y^, (32) 

where we have assumed that ho ■ is small in order to expand the last logarithm. 

Before we proceed, it is worth examining the precise conditions under which our Taylor serie s ex pansions are valid. The 
crucial quantity is hg • which will be small if either hJJ" or are small. Plugging Equation 
we have 

h^^^-.2J;^W-P/-^--")f , (33) 
/ /J(0)exp(-i27ru- 0)d20 ' ^ ' 



28 



into our definition of h, 



which is a quantity whose magnitude is at most the size of the primary beam in radians. This dependence on the primary 



beam means that ho • can be made to be small in different ways depending on the instrument one is considering: 



(i) For widefield instruments, the primary beam width is on the order of 1 radian, so one requires \Shij \ <C A. 

(ii) For instruments with narrow primary beams such as the VLA, we have jhj ^ 1, so the deviations Shij from perfect 
redundancies need not be small compared to the wavelength. 

If the conditions listed above are satisfied, then calibrating our radio telescope is similar to the zeroth-order case, in 
the sense that it is once again tantamount to solving Equation |32[ Here, however, we are not guaranteed to have enough 
equations to solve for all the relevant variables. In addition to the 2N antenna calibration parameters (r;'s and i^'s), we now 
potentially have up to 3N{N — 1) numbers to solve for — the Inco term contributes up to N{N — 1) numbers, since as we 
discussed above a can run to A'^(A'^ — l)/2 in a worst-case scenario, and specifying a single Incg requires two numbers: a real 
part and an imaginary part; the hg term contributes up to 2N{N — 1) numbers — twice that required to specify the Inco 
terms, because ho is the logarithmic gradient of Cq in the uv plane, which is two dimensional. The 5bij's do not need to be 
solved for since they can be computed from the layout of the array. Putting this all together, we see that in general there can 
be up to 3N{N — 1) unknowns. 

To properly solve Equation |32| one must therefore find ways to reduce the range of the index a so that the number of 
equations exceeds the number of unknowns. In the next few subsections, we will consider different scenarios in which this is 
the case. 



3.1 Zeroth Order: Perfect Point Sources or Perfect Redundancies 



The zeroth order solution to Equation 32 involves setting up situations where the ho • term can be set to zero. There are 
two ways to accomplish this: 

(i) Design an array with perfect redundancies, so that Shij = 0. The form of h" is then irrelevant (which is another 
way of saying that we can calibrate with any sky signal), and the only requirement is for the number of unique baselines to 
be small enough that the maximum possible a is modest and there are only a small number of Cq terms to fit for. This is 
precisely the redundant calibration algorithms discussed in Sections |2.2| to 
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(ii) Have some knowledge of the sky signal used for calibration, so that Cq and hp are known. For example, if 
one uses a phased array to image a point calibrator source, then Cq becomes a real constant independent of a, while the 's 
all vanish. This is the point source calibration discussed in Section [2. 1[ 



3.2 First Order: Unknown Sources and Near Perfect Redundancies 



If one is unable to satisfy the conditions listed in Section |3.1[ it may still be possible to calibrate by solving for Cq and hg as 
well as all the calibration parameters simultaneously. For example, taking the real part of Equation |32| gives 



In jc,j| = {r/i + rij) + In \co \ + Re(fto„)- 



A 



+ Re(fto".)^: 



(34) 



which can be written as a matrix system: 
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where for illustrative purposes we have chosen a nine-antenna system with the first three baselines almost-redundant. This set 
of equations can again be solved using Equation [13] provided the design of the interferometer is such that the baselines can be 
grouped into a relatively small number of near-redundant clusters. Fundamentally, the method here is the same as it has been in 
all previous sections — we have simply written the interferometer response as a linear function of the calibration parameters 
and the sky signal, and have found ways to reduce the number of parameters (whether by mathematical approximation, 
interferometer design, or careful selection of calibration source) so that the equations are solvable. In the first order expansion 
considered in this section, if r denotes the number of bo 's with two or more baselines clustered around them and I denotes 
the number of completely isolated baselines, the calibration can be solved for provided our array satisfies 

N{N-1) 



> N + 3r + l, 



(36) 



where as usual TV is the number of antenna elements in the array. The left hand side is simply the number of measured complex 
correlations, while the right hand side counts the number of complex unknowns we need to solve for: A'' complex antenna 
gains, r complex visibilities at each cluster center, 2r complex numbers quantifying deviations in the complex visibilities due to 
departures from perfect redundancy in the two directions of the uv plane, and / complex visibilities for the isolated baselines. 
For the s x s regular square arrays simulated in Section [2j the condition is satisfied if s 4. 

Like before, however, one must be cognizant of possible degeneracies in the matrix system. In particular, consider a cluster 
comprising of one or two baselines. In fitting for this cluster's visibility, we have one or two inputs (i.e. correlations) but need 
to solve for three parameters: Cq, Iiq^, and h^^. Our matrix system thus becomes singular. The same problem arises if either 
Sb^j or Sbij are identical for a large number of baselines in a particular cluster. For instance, if all but one of the baselines in a 
cluster are perturbed in precisely the same way, then as far as the cluster goes we effectively only have two different baselines 
and cannot fit for three parameters. These degeneracies can be dealt with in a similar fashion as before: one simply imposes 
extra constraints on the system of equations, like requiring that /iqu a-nd hg^ be zero for single-baseline clusters. 

Simulation results for a 4 x 4 square array with Gaussian antenna position errors are shown in Figure |13| where we 
show the root mean square errors in the recovered gains Ai] as a function of the spread at in the Gaussian from which the 
position errors are drawn. The simulations were run with precisely the same parameters as those in Section[2] except with zero 
instrumental noise, so any non-zero error is due purely to position errors {i.e. to the non-perfect redundancy of baselines). 
Plotted using the solid circles are the results that one obtains if one simply ignores the fact that the baselines are not perfectly 
redundant i.e. if one applies a zeroth order algorithm based on Equation [6] The errors are clearly linear in ai,. Correcting for 
baseline errors to first order by implementing an algorithm based on Equation |31| gives errors that are smaller and quadratic 
in at. It is also clear from the plot that the wider the primary beam of an instrument, the greater the errors in the calibration. 
Formally, this is because the beam width is directly proportional to the size of the linear baseline error correction term, as 
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Figure 13. R.m.s. errors in recovered rj for one antenna in a noiseless 4x4 square array as a function of antenna position spread cr;,. 
Shown in red dashed lines are the results for a primary beam with width 1°, while the results for a primary beam with width 2° are 
shown using the blue solid lines. The solid circles denote results from an algorithm that does not correct for baseline position errors, and 
the stars denote results from an algorithm that corrects these errors to first order. 



we noted above. This result can also be understood intuitively by considering the effect that a primary beam has on uv 
visibilities. A narrow primary beam acts as a wide convolution kernel on the uv plane, which effectively "averages" together 
visibilities over a wide neighborhood. This smoothes the visibilities on the uv plane, making the errors incurred by baseline 
errors more amenable to a perturbative correction. 

In principle, there is no need to stop at the first order correction. One could, for instance, correct baseline errors to second 
order by using the following equation as the basic measurement equation: 

Crj ~ g^ 9jCo ( 1 + ho ■ ^ + -j^ ■ Jo ■ . . . 1 , (37) 

where ho = V In 7 as before, and jo is a symmetric matrix of second derivatives of / evaluated at bo and normalized 

by J^-^y-j. It is important to note, however, the somewhat counterintuitive fact that for a realistic array {i.e. one with 
instrumental noise), the errors in the recovered gain parameters may increase as one corrects to increasing order in baseline 
position error Shij. This is because the higher the order that one corrects to, the greater the number of parameters that one 
must solve for. The number of correlation measurements, however, remains the same as before (with N{N — l)/2 of them), 
so the system of equations becomes less constrained. This makes the calibration algorithm more susceptible to noise, which 
can negate the benefit that one obtains by correcting baseline position errors to higher order. Put another way, when we 
introduce a large number of parameters in an attempt to correct for baseline position errors, we run the risk of over-fitting 
the instrumental noise instead of averaging it down using redundant baselines. 

In running the baseline error simulations, we find the errors in recovered to be much worse than those in the recovered 
77's. This is likely due to the fact that a small perturbation in antenna position can result in large changes in phases, which 



(as explained in Section 2.6 1 can cause problems in a logarithmic implementation. 

The reader should also note that because quantities like hg are dependent on the sky signal, we find that the calibration 
errors induced by baseline position errors cannot be easily captured by simple formulae analogous to Equation |21[ An extreme 
example of this was already discussed above in Section [3.11 where we saw that the correction terms for baseline errors approach 
zero as the sky becomes increasingly dominated by a single bright point source. The values on the vertical axis of Figure [13] 
are thus not generically applicable; in general, one must simulate the relevant array arrangement and sky signal to estimate 
the errors incurred by antenna position errors. 
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3.3 First Order: Non-coplanar Arrays 

The methods described above can also be used to calibrate radio interferometers even if there are slight deviations from 
planarity. Suppose the antenna elements of our array are all almost (but not quite) located at a height of 2 = 0. The slight 
deviations from perfect coplanarity of the antenna elements result in slight deviations from coplanarity of the baselines, which 
we denote Sb],^ . With these deviations, Equation 



27 



becomes 



Like before, this equation can be manipulated into a form conducive to calibration if one (or both) of the following conditions 
are met: 

ei 



(i) Narrow primary beam. With a narrow primary beam, one has ^/l ^ S'x — 6'y ~1 f- The part of the 

expression corresponding to the non-coplanar correction thus factors out of the integral, giving 

- (-f »y) [j.^mBn-- - - • 

where we have adopted the notation defined in Equation [28] This illustrates the fact that so long as the primary beam is 
narrow, deviations from coplanarity simply results in phase shifts in the measured correlations. Aside from these phase shifts. 
Equation [39] looks completely identical to Equation |29| and thus the calibration of a non-coplanar array is no harder than the 
calibration of a planar array when the primary beam is narrow. One simply defines adjusted correlations c = cexp (— i^iSb*-') 
and the ensuing analysis proceeds like before, with no additional computational cost. 

(ii) Near coplanar array. If the array is close to coplanar {5b^i ^ 1) , the calibration can still be performed, but this time 
at a slight increase in computational cost. With near coplanarity, Equation [38] can be Taylor expanded in 5b^: 

= 9:9^I(^~^9:9,^-^5b^^K(^, (41) 



where I is defined as before in Equations 27 and 28 and 



k{\i) = j j{e)B{e)exp{-i2n\i-e)<fe. (42) 

Perturbing this equation around perfectly redundant baselines using exactly the same methods as those employed between 
Equations |30a| and |31[ one obtains 



c(b) = g'gj 



Co ( 1 + ho • ] - i27r — do 



(43) 



where do = ^ (^"A^j- Note that whereas / is expanded to first order in Sh, we are only required to keep the zeroth order 
term for K since any linear term in 5b would be multiplied by Sb^, and end up a second order term. Proceeding as before, the 
analog of Equation |32| takes the form 

Inc^j ^ {rji + r/j) + i{tpj - ipi) + Iucq + ho ■ + i2Tiqo—^, (44) 

where = ln(do/co). From here, we can once again form a linear system of equations that can be solved to yield the 
calibration parameters, the only difference being that we must solve for yet one more parameter per cluster of baselines. 



4 CONCLUSIONS 

Redundant calibration schemes calibrate radio interferometers by taking advantage of the fact that — once calibrated — 
redundant sets of baselines should yield identical visibility measurements regardless of what the sky looks like. In this paper 
we have performed simulations to examine the error properties of redundant calibration, and have found that the usual 
logarithmic implementations are statistically biased. The linearized implementation introduced in Section [2.7| on the other 
hand, is both unbiased and computationally feasible, being far less computationally intensive than the standard initial step 
of computing the signal correlations. 

Though for many of our simulations we deliberately chose unfavorable SNRs to exaggerate the effects of instrumental 
noise, the errors are seen to scale inversely with SNR, which means that Equation |2l] and Figure [9] predict that redundant 
baseline calibration can yield high precision calibration parameters in a realistic application. Moreover, the fact that the 
linearized method of Section |2.7| is unbiased means that even if the SNR were unfavorable, one could simply build larger 
arrays and average over long time periods, in principle suppressing calibration errors to arbitrarily low levels. 
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We have also shown that both point source cahbration and redundant caUbration can be considered special cases within 
a generalized framework of algorithms, all of which find ways to reduce the number of calibration and visibility parameters so 
that there are few enough of them to bo solved for using the N{N — l)/2 measurement equations. This can be done either by 
making a priori assumptions about the calibration sources, or by taking advantage of baseline redundancy. If one has many 
more constraints than unknowns, non-exact redundancy and non-coplanarity can also be taken into account, which may be 
essential as the precision requirements for calibration become more and more stringent. That redundant baseline calibration 
appears able to satisfy such requirements while being computationally feasible is encouraging for large radio arrays, suggesting 
that calibration will not limit the vast scientific potential of 21 cm tomography. 
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